Collections of journal papers, often referred to as 'citation networks', can be modeled as a
collection of coupled bipartite networks which tend to exhibit linear growth and preferential
attachment as papers are added to the collection. Assuming primary nodes in the first partition and
secondary nodes in the second partition, the basic bipartite Yule process assumes that as each
primary node is added to the network, it links to multiple secondary nodes, and each new link
connects, with probability a, to a newly appearing secondary node. The number of links from a new
primary node follows some distribution that is a characteristic of the specific network. Links to
existing secondary nodes follow a preferential attachment rule. With modifications to adapt to
specific networks, bipartite Yule processes simulate networks that can be validated against actual
networks using a wide variety of network metrics. The application of bipartite Yule processes to
the simulation of paper-reference networks and paper-author networks is demonstrated, and
simulation results are shown to mimic networks from actual collections of papers across several
network metrics.

I. COLLECTIONS OF PAPERS AS COUPLED
BIPARTITE NETWORKS



As shown in Figure 1, a collection of journal papers
constitutes a series of coupled bipartite networks [?]. A
collection of papers contains 6 direct bipartite networks:
1) papers to paper authors, 2) papers to references,
3) papers to paper journals, 4) papers to terms,
5) references to reference authors, and 6) references to
reference journals. Additionally, there are 15 indirect
bipartite networks in collections of papers as defined by
the diagram. Examples of interesting indirect networks
are paper author to reference author networks and paper
journal to reference journal networks, which can be used
for author co-citation analysis [?] and journal co-citation
analysis [?], respectively.
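
The count of 15 indirect networks can be checked with a short enumeration. The sketch below assumes that "indirect" means every pair of the seven entity types in Figure 1 that is not one of the six direct networks; the entity labels are taken from the list above, and the variable names are illustrative only.

```python
from itertools import combinations

# The seven entity types that appear in the six direct networks.
ENTITY_TYPES = [
    "paper", "paper author", "reference", "paper journal",
    "term", "reference author", "reference journal",
]

# The six direct bipartite networks listed in the text.
DIRECT = {
    frozenset(("paper", "paper author")),
    frozenset(("paper", "reference")),
    frozenset(("paper", "paper journal")),
    frozenset(("paper", "term")),
    frozenset(("reference", "reference author")),
    frozenset(("reference", "reference journal")),
}

# Assumption: every other pair of entity types is an indirect network.
all_pairs = {frozenset(pair) for pair in combinations(ENTITY_TYPES, 2)}
indirect = all_pairs - DIRECT

print(len(all_pairs), len(DIRECT), len(indirect))
```

Under this reading, the 21 possible pairs split into 6 direct and 15 indirect networks, in agreement with the count given above.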

Modeling the growth of these bipartite networks helps
characterize the underlying processes driving a research
specialty, such as knowledge accretion, researcher
productivity, or collaboration processes. Bipartite growth
models produce many network metrics, allowing
comprehensive validation of models against real
collections of papers.

FIG. 1: Diagram showing a collection of papers as a series of
coupled bipartite networks.



II. BASIC BIPARTITE YULE PROCESSES 

As originally proposed, Yule processes do not model
networks, but simply model the formation of power-law
frequency distributions of items [?]. For a bipartite Yule
process, assume a bipartite network where nodes fall
into two partitions: 1) primary nodes and 2) secondary
nodes. Typically, primary nodes are papers while
secondary nodes are entities that are associated with
papers, such as authors, references, journals, or terms.

Figure 2 shows a diagram of a bipartite paper-reference 
network, where the primary nodes are papers and the 
secondary nodes are references, and papers are linked to 
references by citations. 

Figure 3 shows a diagram of a basic bipartite Yule 
process: 

• The network grows by adding primary nodes one
at a time.

• When a new primary node is added, it links to N
secondary nodes. N is a random deviate drawn
from a discrete probability distribution that is a
characteristic of the type of network being modeled.
For paper-reference networks N is lognormally
distributed [?], while for paper-author networks N
is 1-shifted Poisson distributed [?]. For
paper-journal networks, N is unity, since a paper is
only linked to one journal, the one in which it was
published. As defined here, a primary entity does
not link to any specific secondary entity more than
once.

• For each of the N links, there is a probability, a,
that it will link to a newly appearing secondary
node.

• If a link happens to be to an existing secondary
node, the linked node is selected using preferential
attachment, that is, the probability of linking to
a secondary node is proportional to the number of
links that the node possesses.

The stationary distribution of the link degree of the
secondary nodes is a Yule distribution [?], a power
law whose exponent is 1 + 1/(1 - a). The stationary
distribution is independent of the distribution of N, but
for finite collections of papers the distribution of N
profoundly affects the tail of the distribution.

FIG. 2: Diagram showing a bipartite network of papers and
the references that they cite.

FIG. 3: Diagram of a basic bipartite Yule process.



III. PRACTICAL BIPARTITE YULE
PROCESSES

In practice, the basic bipartite Yule process outlined
in the preceding section must be modified to account
for the characteristics of the specific type of bipartite
network being studied.


A. Paper-reference Yule process 

Figure 4 shows a diagram of a bipartite Yule process
modified for the characteristics of paper-reference
networks. The details of this model, its scope, and a
discussion of the evidence for its validity appear in [?].
Paper-reference networks in collections of papers covering
scientific specialties are characterized by the accretion of
highly cited exemplar references, which are cited at rates
far higher than would be predicted by simple preferential
attachment. These exemplar references tend to appear
during the initial growth of the network and their rate of
appearance decreases exponentially as papers are added
to the collection.

As each paper is added to the collection, it links to
a lognormally distributed number of references, as
discussed in [?]. For each reference cited by a paper, there
is a probability a that the citation is to a newly appearing
reference. When a new reference appears, there is a
small probability that the reference will be a highly
attractive exemplar reference. If so, the reference receives a
large initial attraction, A_0. Newly created non-exemplar
references receive no initial attraction. If a citation is to
an existing reference, the probability that any particular
existing reference will be cited is proportional to the sum
of its attraction and the number of times it has been
cited. A specific reference cannot be cited more than
once by a paper.
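
The steps of the paper-reference process can be sketched as follows. This is an illustrative reading of the description above, not the published implementation. The parameter names (`mu`, `sigma`, `p_exemplar0`, `tau`, `a0`) are hypothetical, the exponential form `p_exemplar0 * exp(-t / tau)` is one assumed way to make the exemplar appearance rate decay, and rounding the lognormal deviate to an integer of at least 1 is likewise an assumption.

```python
import math
import random

def simulate_paper_reference(n_papers, alpha, mu, sigma,
                             p_exemplar0, tau, a0, seed=0):
    """Return (papers, citations, attraction).

    papers[i]     -- set of reference ids cited by paper i
    citations[j]  -- number of times reference j has been cited
    attraction[j] -- initial attraction of reference j (a0 or 0.0)
    """
    rng = random.Random(seed)
    papers, citations, attraction = [], [], []
    for t in range(n_papers):
        # Lognormally distributed reference count (rounded, at least 1).
        n_refs = max(1, round(rng.lognormvariate(mu, sigma)))
        # Assumed exponential decay of the exemplar appearance rate.
        p_exemplar = p_exemplar0 * math.exp(-t / tau)
        cited = set()
        for _ in range(n_refs):
            uncited = [j for j in range(len(citations)) if j not in cited]
            if rng.random() < alpha or not uncited:
                # Citation to a newly appearing reference.
                j = len(citations)
                citations.append(0)
                is_exemplar = rng.random() < p_exemplar
                attraction.append(a0 if is_exemplar else 0.0)
            else:
                # Preferential attachment on attraction + citation count,
                # restricted to references this paper has not yet cited.
                weights = [attraction[k] + citations[k] for k in uncited]
                j = rng.choices(uncited, weights=weights)[0]
            citations[j] += 1
            cited.add(j)
        papers.append(cited)
    return papers, citations, attraction

# Hypothetical parameter values, chosen only to exercise the sketch.
papers, citations, attraction = simulate_paper_reference(
    n_papers=200, alpha=0.6, mu=2.0, sigma=0.5,
    p_exemplar0=0.05, tau=50.0, a0=20.0)
print(len(papers), len(citations), max(citations))
```

Every reference is cited once at the moment it appears, so each existing reference has a weight of at least 1 and the weighted draw is always well defined.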



B. Paper-author Yule process 

Figure 5 shows a diagram of the basic bipartite Yule
process modified for the characteristics of paper-author
networks. The details of this model, its scope, and a
discussion of the evidence for its validity appear in [?] and
[7]. In this case the Yule process is applied to teams of
researchers rather than individual researchers. As each
paper is added, there is a probability that the paper will
be authored by a new research team. If so, a team of N_0
authors is added to the network, but only N(λ) appear
as authors of the team's first paper, where N(λ) is a
random deviate drawn from a 1-shifted Poisson distribution
whose parameter is λ. Otherwise, an existing team is
chosen using preferential attachment, that is, the
probability that a team will author the new paper is
proportional to the number of papers that the team has
previously published.

FIG. 4: Diagram showing a bipartite Yule process for paper-
reference networks.

When selecting authors for an existing team's paper,
N(λ) authors are chosen and the authors are selected
using preferential attachment, specifically, the probability
of selecting an author is proportional to 1 plus the
number of papers that the author has published. Inter-team
collaborations (weak ties) are modeled as random events;
when an existing author is to be selected there is a
probability P that the author will be drawn randomly from
some other team.
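
A minimal sketch of the team-based process follows. It is an interpretation of the description above under several labeled assumptions: the number of authors on a paper is capped at the team size, the authors of a new team's first paper are drawn without weak ties, a weak-tie author is drawn uniformly from one other randomly chosen team, and all function and parameter names are hypothetical.

```python
import math
import random

def shifted_poisson(rng, lam):
    """1-shifted Poisson deviate, 1 + Poisson(lam), by Knuth's method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return 1 + k

def pick(rng, candidates, weight, exclude):
    """Weighted draw from candidates not in exclude (None if none left)."""
    pool = [c for c in candidates if c not in exclude]
    if not pool:
        return None
    return rng.choices(pool, weights=[weight(c) for c in pool])[0]

def simulate_paper_author(n_papers, p_new_team, n0, lam, p_weak, seed=0):
    """Return (papers, author_papers, team_papers)."""
    rng = random.Random(seed)
    teams, team_papers, author_papers, papers = [], [], [], []
    for _ in range(n_papers):
        is_new = not teams or rng.random() < p_new_team
        if is_new:
            first = len(author_papers)
            teams.append(list(range(first, first + n0)))
            author_papers.extend([0] * n0)
            team_papers.append(0)
            t = len(teams) - 1
        else:
            # Teams are chosen in proportion to their published papers.
            t = rng.choices(range(len(teams)), weights=team_papers)[0]
        n_authors = min(shifted_poisson(rng, lam), n0)  # cap: assumption
        authors = set()
        for _ in range(n_authors):
            a = None
            if not is_new and len(teams) > 1 and rng.random() < p_weak:
                # Weak tie: uniform draw from one other team (assumption).
                others = [u for u in range(len(teams)) if u != t]
                a = pick(rng, teams[rng.choice(others)], lambda x: 1, authors)
            if a is None:
                # Preferential attachment on 1 + papers published.
                a = pick(rng, teams[t], lambda x: 1 + author_papers[x], authors)
            authors.add(a)
        for a in authors:
            author_papers[a] += 1
        team_papers[t] += 1
        papers.append(authors)
    return papers, author_papers, team_papers

# Hypothetical parameter values, chosen only to exercise the sketch.
papers, author_papers, team_papers = simulate_paper_author(
    n_papers=500, p_new_team=0.2, n0=6, lam=1.5, p_weak=0.05)
print(len(papers), len(team_papers), max(author_papers))
```

Because at most `n0` authors are requested and any weak-tie author comes from another team, the home team always has an unused member available for the fallback draw.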



IV. NETWORK METRICS 

Simulation using a bipartite Yule process fully
preserves the topology of the network phenomenon being
studied. The adjacency matrix for a bipartite network is
a roughly lower triangular rectangular matrix. Figure 6
shows the adjacency matrices of the paper-reference
network, paper-author network, and paper-journal network
in an actual collection of papers.

FIG. 5: Diagram showing a bipartite Yule process for paper-
author networks.

From each bipartite network, two co-occurrence net- 
works can be derived with their own characteristic topol- 
ogy. For example, a paper-reference network yields two 
unipartite networks, a bibliographic coupling network of 
papers linked by common references and a co-citation 
network of references linked by their common papers. A 
paper-author network yields a collaboration network of 
authors connected by common papers and also a network 
of papers connected by common authors. 
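
The derivation of the two co-occurrence networks from a bipartite network can be illustrated on a toy example. The sketch below is an assumption-laden illustration (the helper names `invert` and `project` and the three-paper data set are invented for the example): the weight of a link between two nodes is taken to be the number of neighbors they share in the bipartite network.

```python
from collections import Counter
from itertools import combinations

def invert(links):
    """Swap the partitions: map each secondary node to its primary nodes."""
    inverse = {}
    for primary, secondaries in links.items():
        for s in secondaries:
            inverse.setdefault(s, set()).add(primary)
    return inverse

def project(links):
    """Weighted co-occurrence network among the keys of `links`.

    The weight of pair (u, v) is the number of neighbors u and v share.
    """
    weights = Counter()
    for members in invert(links).values():
        for u, v in combinations(sorted(members), 2):
            weights[(u, v)] += 1
    return weights

# Toy paper-reference network (invented data).
paper_refs = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3"},
    "P3": {"R3", "R4"},
}

coupling = project(paper_refs)             # papers linked by common references
cocitation = project(invert(paper_refs))   # references linked by common papers
print(coupling[("P1", "P2")], cocitation[("R2", "R3")])
```

Applying the same two calls to a paper-author network yields the paper network linked by common authors and the co-authorship network, respectively.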

Network metrics that characterize a bipartite network 
can be derived from link degree distributions in the bi- 
partite network and link degree distributions in the asso- 
ciated unipartite co-occurrence networks. Many of these 
metrics can be tied to indicators of the underlying re- 
search process generating the collection of papers. 

A set of useful metrics for paper-reference networks
includes:

• reference per paper distribution - This tends to be
a lognormal distribution whose mean, m, is from
15 to 30 references per paper [?].

• paper per reference distribution - This tends to be
a power-law distribution with a characteristic
exponent that ranges from 2 to 4.

• bibliographic coupling strength per paper pair
distribution - This is the link weight distribution of the
bibliographic coupling network.

• co-citation coupling strength per reference pair
distribution - This is the link weight distribution of
the co-citation network.

• bibliographic coupling clustering coefficient
distribution - This is the distribution of the clustering
coefficients for the bibliographic coupling network.

FIG. 6: Diagrams of adjacency matrices of bipartite networks in a collection of 902 papers on the topic of complex networks.


In paper-reference networks, the mean references per pa- 
per is typically about 30, while the mean papers per ref- 
erence is typically about 1.4, the mean of a zeta (pure 
power-law) distribution with exponent of 3. This con- 
strains the ratio of references to papers in the collection 
to be about 20, that is, a collection of papers typically 
has about 20 times more references than papers. 
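
The two figures quoted here can be reproduced numerically. The mean of a zeta distribution with exponent s is ζ(s-1)/ζ(s), and since the total number of citations equals both (papers × mean references per paper) and (references × mean papers per reference), the ratio of references to papers is the quotient of the two means. The sketch below is illustrative; the `zeta` helper, which uses a truncated sum with a tail correction, is an assumption of the example rather than part of the original analysis.

```python
def zeta(s, terms=100_000):
    """Riemann zeta for s > 1: partial sum plus an integral tail correction."""
    head = sum(k ** -s for k in range(1, terms))
    # Tail sum over k >= terms, approximated by N^(1-s)/(s-1) + N^(-s)/2.
    return head + terms ** (1 - s) / (s - 1) + 0.5 * terms ** -s

exponent = 3
mean_papers_per_reference = zeta(exponent - 1) / zeta(exponent)

mean_references_per_paper = 30
references_per_paper_in_collection = (
    mean_references_per_paper / mean_papers_per_reference)

print(round(mean_papers_per_reference, 2),
      round(references_per_paper_in_collection, 1))
```

The computed mean is close to the quoted value of 1.4, and the resulting ratio is consistent with the figure of about 20 given above.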

A set of useful metrics for paper-author networks
includes:

• authors per paper distribution - This tends to be
a 1-shifted Poisson distribution whose mean varies
from 2 for fields such as mathematics to more than
10 for biomedical fields [?].

• paper per author distribution - This tends to be a
power-law (Lotka's Law), whose exponent ranges
from 2 to 4 [?].

• collaborating author distribution - This is the
distribution of the number of unique co-authors per
author in the collection, and is the link degree
distribution of the unweighted co-authorship network.

• co-authorship per author pair distribution - This
is the link weight distribution of the weighted
co-authorship network.

• co-authorship clustering coefficient distribution -
This is the distribution of the clustering coefficients
of the unweighted co-authorship network.

• minimum co-authorship path length distribution
- This is the distribution of minimum path
lengths between author pairs in the unweighted
co-authorship network.
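
Three of the paper-author metrics listed above can be computed directly from the author lists of the papers. The sketch below is a minimal illustration on invented data; the function names are hypothetical, and the clustering coefficient and path length metrics are not covered.

```python
from collections import Counter
from itertools import combinations

def author_metrics(papers):
    """papers: iterable of author sets, one set per paper.

    Returns (papers_per_author, collaborators, coauthorship):
      papers_per_author[a]  -- number of papers by author a
      collaborators[a]      -- number of unique co-authors of a
      coauthorship[(a, b)]  -- number of papers a and b wrote together
    """
    papers_per_author = Counter()
    coauthorship = Counter()
    for authors in papers:
        papers_per_author.update(authors)
        coauthorship.update(combinations(sorted(authors), 2))
    # Start every author at zero so sole authors are counted too.
    collaborators = {a: 0 for a in papers_per_author}
    for a, b in coauthorship:
        collaborators[a] += 1
        collaborators[b] += 1
    return papers_per_author, collaborators, coauthorship

def frequencies(values):
    """Frequency table: value -> number of items having that value."""
    return Counter(values)

# Invented example: three papers, four authors.
papers = [{"A", "B"}, {"A", "B", "C"}, {"D"}]
per_author, collaborators, coauthorship = author_metrics(papers)

print(frequencies(per_author.values()))       # paper per author
print(frequencies(collaborators.values()))    # collaborating author
print(frequencies(coauthorship.values()))     # co-authorship per pair
```

The frequency tables are the quantities that would be plotted on log-log axes when comparing a simulation against an actual collection.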



V. EXAMPLES 
A. Example simulation of paper-reference network 

The Yule model for paper-reference networks was 
tested on a collection of papers that cover the topic 
of complex networks. This collection was gathered on 
September 8th, 2003 from ISI's Web of Science product 
using a series of queries to find all papers that cite key 
references and authors in the specialty. The collection 
contains 902 papers with 31355 citations to 19185 refer- 
ences. The Yule parameter, a, estimated by dividing the 
number of references by the number of citations to ref- 
erences, is 0.61. The mean references per paper is 34.8. 
The parameters used for the bipartite Yule simulation of
this collection can be found in [6].
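
The two summary figures for this collection follow directly from the counts given above. Each citation introduces a new reference with probability a, so the fraction of citations that created a reference, that is, references divided by citations, estimates a. The variable names in this short check are illustrative.

```python
n_papers = 902
n_citations = 31355
n_references = 19185

# Fraction of citations that introduced a new reference.
alpha_estimate = n_references / n_citations

# Mean number of references cited per paper.
mean_refs_per_paper = n_citations / n_papers

print(round(alpha_estimate, 2), round(mean_refs_per_paper, 1))  # 0.61 34.8
```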

Figure 7 shows plots comparing network metrics from
the actual data to a Yule simulation of network growth.
The upper left plot is of papers per reference
frequencies. Maximum likelihood estimation (MLE) gives
power-law exponents of 3.0 for the actual frequencies
and 2.85 for the simulation. The paper-reference Yule
process mimics the phenomenon of exceptionally highly
cited exemplar references in the extreme lower right of
the plot. The upper right plot is of frequency of
bibliographic coupling strength per paper pair. The Yule
process-based simulation frequencies match the actual
frequencies well. The series of high bibliographic
coupling strength pairs in the lower right from actual data
corresponds to pairs of review papers with long lists of
almost identical references, a phenomenon not modeled
by the Yule process. The lower left plot of Figure 7 is of
frequency of co-citation strength per reference pair. The
simulated frequencies match the actual frequencies well
across the whole plot. The lower right plot is of
bibliographic coupling clustering coefficient distribution. The
simulated distribution matches the shape and scale of the
actual data.

FIG. 7: Comparison plots of paper per reference frequency (upper left), bibliographic coupling strength frequency (upper right),
co-citation strength frequency (lower left), and bibliographic coupling clustering coefficient distribution (lower right), from a
collection of 902 papers on the topic of complex networks.
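
One way to obtain an MLE power-law exponent from a frequency table is to maximize the zeta (pure power-law) likelihood numerically. The sketch below is an assumed procedure for illustration, not necessarily the estimator used for the values reported here: it minimizes the negative log-likelihood s·Σ c_k ln k + n·ln ζ(s) by golden-section search, and the function names, search bounds, and toy frequency table are all hypothetical.

```python
import math

def zeta(s, terms=2000):
    """Riemann zeta for s > 1: partial sum plus an integral tail correction."""
    head = sum(k ** -s for k in range(1, terms))
    return head + terms ** (1 - s) / (s - 1) + 0.5 * terms ** -s

def zeta_mle_exponent(freq, lo=1.05, hi=6.0, tol=1e-6):
    """MLE exponent of a zeta distribution.

    freq maps a value k >= 1 (e.g. papers per reference) to the number
    of items observed with that value.
    """
    n = sum(freq.values())
    sum_log = sum(count * math.log(k) for k, count in freq.items())

    def neg_loglik(s):
        return s * sum_log + n * math.log(zeta(s))

    # Golden-section search; the negative log-likelihood is convex in s.
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = neg_loglik(c), neg_loglik(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = neg_loglik(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = neg_loglik(d)
    return (a + b) / 2

# Toy frequency table (invented): counts fall off roughly as k^-3.
toy = {1: 8000, 2: 1000, 3: 300, 4: 125, 5: 64, 10: 8, 20: 1}
print(round(zeta_mle_exponent(toy), 2))
```

Fitting the full likelihood rather than a regression line on a log-log plot avoids the bias introduced by the sparse, noisy tail of the frequency table.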



B. Example simulation of a paper-author network 

The Yule model for paper-author networks was tested
on three collections of papers representing specialties
with a wide range of collaboration intensities. A
collection of 1391 papers on the topic of distance learning
with 51% single-authored papers represents a specialty
with little collaboration. A collection of 900 papers on
the topic of complex networks with 21% single-authored
papers represents a specialty with a typical amount of
collaboration. Finally, a collection of 3095 papers on the
topic of atrial ablation with 7% single-authored papers
represents a specialty with heavy collaboration [7]. The
parameters used for bipartite Yule simulation of these
paper-author networks can be found in [?].

Figures 8, 9 and 10 show the comparison of Yule model 
simulations to actual data for these three collections us- 
ing two metrics: 1) paper per author frequency (Lotka's 
Law), and 2) collaborating author frequency. 

The left plots in Figures 8, 9 and 10 are paper per
author frequency plots. The bipartite Yule process
produces excellent matches to actual data. The inset plots
show Yule model predicted paper per author
distributions derived by gathering statistics from 1000
simulations for each collection. A line representing an MLE
fitted zeta (pure power-law) distribution is shown in each
inset. The Yule model produces excellent fits to the zeta
distribution for all three collections, confirming the Yule
model's usefulness as a predictor of Lotka's Law. Note
that the deviation of the distributions from the zeta
distribution in the tail of the distributions is due to
truncating the simulations at the number of papers in each
collection. The plots on the right side of Figures 8,
9 and 10 show that the bipartite Yule model produces
good matches of collaborating author frequencies to
actual data across the wide range of collaboration intensities
represented by the three collections.

FIG. 8: Comparison of bipartite Yule simulation against actual data for plots of paper per author frequencies and collaborating
author frequencies for the distance education paper collection.

FIG. 9: Comparison of bipartite Yule simulation against actual data for plots of paper per author frequencies and collaborating
author frequencies for the complex networks paper collection.



FIG. 10: Comparison of bipartite Yule simulation against actual data for plots of paper per author frequencies and collaborating



VI. FUTURE WORK

The research on bipartite Yule processes discussed here
will be extended to modeling of coupled bipartite
networks. Figure 10 shows an example of coupled bipartite
networks, where a paper-author network is coupled to a
paper-reference network through common papers. The
challenge is to invent a model that reproduces the
correlation of groups of authors to groups of references, a
phenomenon that cannot be modeled using two separate
bipartite processes.